Poster: brewster Date: Feb 26, 2016 10:56am
Forum: texts Subject: djvu files for new uploads


The Internet Archive will soon stop creating DJVU files for uploaded text files. The reasons are declining use, errors in the creation of new files, and the difficulty of supporting the Java viewer.

We will continue to make the _djvu.xml files, which contain word-based OCR output with coordinates, and we have an improved way of making .txt files that will work in more cases (currently .txt creation depends on the DJVU file being created, and this has not been very consistent).
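
For anyone who wants to use the _djvu.xml output directly, here is a minimal Python sketch that pulls out words with their boxes and rebuilds plain text. It assumes the layout commonly seen in these files (one OBJECT per page, nested down to LINE and WORD elements, with WORD@coords read as "left,bottom,right,top" in pixels from the top-left corner); the sample fragment and the function names are hypothetical, not an Internet Archive tool.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed _djvu.xml layout (not taken from a real item).
SAMPLE = """<DjVuXML><BODY>
<OBJECT data="page1.djvu" width="800" height="1000"><HIDDENTEXT>
<PAGECOLUMN><REGION><PARAGRAPH>
<LINE><WORD coords="100,150,220,120">Internet</WORD><WORD coords="230,150,330,120">Archive</WORD></LINE>
<LINE><WORD coords="100,200,180,170">texts</WORD></LINE>
</PARAGRAPH></REGION></PAGECOLUMN>
</HIDDENTEXT></OBJECT>
</BODY></DjVuXML>"""


def parse_words(xml_text):
    """Return one list per page of (text, (left, top, right, bottom)) tuples."""
    root = ET.fromstring(xml_text)
    pages = []
    for obj in root.iter("OBJECT"):
        words = []
        for word in obj.iter("WORD"):
            # Some files append extra values to coords; keep only the first four.
            left, bottom, right, top = (int(v) for v in word.get("coords").split(",")[:4])
            words.append(((word.text or "").strip(), (left, top, right, bottom)))
        pages.append(words)
    return pages


def to_plain_text(xml_text):
    """Join words into lines and lines into pages, roughly like a derived .txt file."""
    root = ET.fromstring(xml_text)
    page_texts = []
    for obj in root.iter("OBJECT"):
        lines = [" ".join((w.text or "").strip() for w in line.iter("WORD"))
                 for line in obj.iter("LINE")]
        page_texts.append("\n".join(lines))
    return "\n\n".join(page_texts)


if __name__ == "__main__":
    print(parse_words(SAMPLE)[0])
    print(to_plain_text(SAMPLE))
```

Boxes are returned as (left, top, right, bottom) so they can be fed to tools that expect a top-left origin; check a real file's coordinates against its page image before relying on that reading.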

We will keep the DJVU files that have already been created unless some upstream change in an item makes them obsolete.

Thank you to the DJVU community, and LizardTech, for working with the Internet Archive on this.

-brewster
Digital Librarian, Internet Archive

Poster: e.a.mora Date: Sep 22, 2018 6:25pm
Forum: texts Subject: Re: djvu files for new uploads

Hello,

How can I create a djvu file from the djvu.xml and/or the other xml files?

Regards.

Poster: Jeff Kaplan Date: Sep 23, 2018 2:26pm
Forum: texts Subject: Re: djvu files for new uploads

We no longer support creating the DJVU file format.

Poster: e.a.mora Date: Sep 23, 2018 2:30pm
Forum: texts Subject: Re: djvu files for new uploads

Hello, thanks for your reply.

I understand that you no longer support djvu file creation. There are many books that have _djvu.xml and other xml files available. How can I create a djvu file using those xml files?

Regards.

Poster: Jeff Kaplan Date: Sep 24, 2018 12:04pm
Forum: texts Subject: Re: djvu files for new uploads

I wouldn't know the answer to that, but an online converter may work.
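
One possible do-it-yourself route, sketched under several assumptions: DjVuLibre is installed, the page images have already been encoded into a DjVu document (for example with its c44 and djvm tools), and the _djvu.xml uses the layout assumed below. The OCR can then be converted into the s-expression hidden-text format that DjVuLibre's djvused reads with `set-txt`. DjVu places the origin at the bottom-left, so y values are flipped using the page height. The function names and sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; WORD@coords assumed to be "left,bottom,right,top", top-left origin.
SAMPLE = """<DjVuXML><BODY>
<OBJECT data="page1.djvu" width="800" height="1000"><HIDDENTEXT>
<PAGECOLUMN><REGION><PARAGRAPH>
<LINE><WORD coords="100,150,220,120">Internet</WORD><WORD coords="230,150,330,120">Archive</WORD></LINE>
<LINE><WORD coords="100,200,180,170">texts</WORD></LINE>
</PARAGRAPH></REGION></PAGECOLUMN>
</HIDDENTEXT></OBJECT>
</BODY></DjVuXML>"""


def quote(text):
    """Escape backslashes and double quotes for a djvused string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def node(kind, box, children):
    """Format one hidden-text node: (kind xmin ymin xmax ymax children...)."""
    return "(%s %d %d %d %d %s)" % (kind, box[0], box[1], box[2], box[3], " ".join(children))


def page_to_sexpr(obj):
    """Convert one OBJECT element into a djvused (page ...) expression."""
    width, height = int(obj.get("width")), int(obj.get("height"))
    lines = []
    for line in obj.iter("LINE"):
        words, boxes = [], []
        for word in line.iter("WORD"):
            left, bottom, right, top = (int(v) for v in word.get("coords").split(",")[:4])
            # Flip y: the XML is assumed to use a top-left origin, DjVu uses bottom-left.
            box = (left, height - bottom, right, height - top)
            boxes.append(box)
            words.append(node("word", box, [quote((word.text or "").strip())]))
        if boxes:
            # A line's box is the union of its word boxes.
            line_box = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                        max(b[2] for b in boxes), max(b[3] for b in boxes))
            lines.append(node("line", line_box, words))
    return node("page", (0, 0, width, height), lines)


def djvu_xml_to_sexprs(xml_text):
    """Return one hidden-text expression per page, in document order."""
    return [page_to_sexpr(obj) for obj in ET.fromstring(xml_text).iter("OBJECT")]


if __name__ == "__main__":
    for expr in djvu_xml_to_sexprs(SAMPLE):
        print(expr)
```

Each page's expression could then be saved to its own file (say page1.txt) and applied with something like `djvused -s -e 'select 1; set-txt page1.txt' book.djvu`; check the djvused manual for your version before running it on real files.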

Poster: Alex.brollo Date: Mar 16, 2016 4:20am
Forum: texts Subject: Re: djvu files for new uploads

The mapped DjVu text layer is really very interesting and accessible, while the mapped PDF text layer is not. The alternative would be to parse the _abbyy.xml file (VERY heavy and complex, even if very interesting!), but there is a possible way to get a simpler mapped-text output: add hOCR to the list of derived files. It would be very interesting to put something more than text into the hOCR, i.e. something about the recognition quality level and perhaps some text formatting (character weight and style), derived from the _abbyy.xml data.
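
To make the hOCR idea concrete, here is a minimal Python sketch that renders one page of word boxes, with a per-word confidence, as hOCR using the standard `ocr_page`, `ocr_line` and `ocrx_word` classes and the `bbox` and `x_wconf` title properties. The input shape is a hypothetical one: lines of (text, box, confidence) tuples with a top-left origin and a 0-100 confidence, which would have to be aggregated from the _abbyy.xml character data first. Character weight and style are not handled here.

```python
from html import escape


def union(boxes):
    """Smallest (x0, y0, x1, y1) box containing all the given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def bbox(box):
    """Format an hOCR bbox property."""
    return "bbox %d %d %d %d" % tuple(box)


def to_hocr(page_size, lines):
    """Render one page as a minimal hOCR document.

    page_size: (width, height) in pixels.
    lines: list of lines, each a list of (text, (x0, y0, x1, y1), confidence).
    """
    width, height = page_size
    body = ['<div class="ocr_page" id="page_1" title="%s">' % bbox((0, 0, width, height))]
    word_no = 0
    for line_no, words in enumerate(lines, 1):
        if not words:
            continue
        body.append('<span class="ocr_line" id="line_1_%d" title="%s">'
                    % (line_no, bbox(union([w[1] for w in words]))))
        for text, box, conf in words:
            word_no += 1
            # x_wconf carries the recognition confidence Alex asks for.
            body.append('<span class="ocrx_word" id="word_1_%d" title="%s; x_wconf %d">%s</span>'
                        % (word_no, bbox(box), conf, escape(text)))
        body.append("</span>")
    body.append("</div>")
    head = ('<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n'
            '<meta charset="utf-8"/>\n'
            '<meta name="ocr-capabilities" content="ocr_page ocr_line ocrx_word"/>\n'
            '</head>\n<body>\n')
    return head + "\n".join(body) + "\n</body>\n</html>\n"


if __name__ == "__main__":
    print(to_hocr((800, 1000), [[("Internet", (100, 120, 220, 150), 93),
                                 ("Archive", (230, 120, 330, 150), 88)]]))
```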

An alternative for DjVu users is to run pdf2djvu. It fails on many Google PDF pages, but I presume it will never fail on IA PDF files; however, some PDF files are a little too compressed, and a derived DjVu can't be of better quality than its source PDF.
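
If you want to script that conversion over many downloads, a small Python wrapper is enough. This sketch assumes pdf2djvu is installed and on PATH, and uses only its `-o` (output file) and optional `--dpi` (output resolution) options; the function names are hypothetical. Raising the resolution cannot restore detail that the PDF's compression has already thrown away.

```python
import shutil
import subprocess


def pdf2djvu_command(pdf_path, djvu_path, dpi=None):
    """Build the argument list for pdf2djvu (dpi is optional)."""
    cmd = ["pdf2djvu", "-o", djvu_path]
    if dpi is not None:
        cmd.append("--dpi=%d" % dpi)
    cmd.append(pdf_path)
    return cmd


def convert(pdf_path, djvu_path, dpi=None):
    """Run pdf2djvu; raise if it is missing or exits with an error."""
    if shutil.which("pdf2djvu") is None:
        raise FileNotFoundError("pdf2djvu is not on PATH")
    subprocess.run(pdf2djvu_command(pdf_path, djvu_path, dpi), check=True)


if __name__ == "__main__":
    print(" ".join(pdf2djvu_command("book.pdf", "book.djvu", dpi=300)))
```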

This post was modified by Alex.brollo on 2016-03-16 11:20:16

Poster: kmlyvens Date: Sep 1, 2016 1:44pm
Forum: texts Subject: Re: djvu files for new uploads

It's a pity – I find DJVU files far superior to PDFs, thanks to their smaller size and embedded OCR text layer. It's a curious fact that DJVU on Linux renders lightning-fast (just try it!) in both evince/atril (and other viewers as well, I guess) and djview, while archive.org's JPEG2000-based PDFs are really, really slow to render each page.

Of course, the Java applet was never any good, especially considering that Java applet support is being phased out in browsers. And since the DJVU button has always linked to the Java applet rather than to the file itself (unlike PDFs and other formats, which link to the file), it is no wonder that DJVU usage has been declining.

Poster: Nemo_bis Date: Mar 5, 2016 12:43pm
Forum: texts Subject: Re: djvu files for new uploads

Thanks. How did you measure the declining use?

Poster: Jeff Kaplan Date: Mar 5, 2016 9:50pm
Forum: texts Subject: Re: djvu files for new uploads

I didn't, but the folks here who can, did.

Poster: Nemo_bis Date: Mar 16, 2016 2:38am
Forum: texts Subject: Re: Statistics on DjVu usage (and Wikimedia)

Ok, it would be very interesting to hear more about it.

DjVu is an important technology for the web and for libraries, and of course for Wikimedia/Wikisource where I come from. For what it's worth, I compiled some statistics for DjVu files on Wikimedia wikis, which largely come from the Internet Archive: in February 2016, DjVu files had 1625 GB downloaded and 13 million accesses, of which only 4 million come from standard thumbnails (e.g. on Wikipedia articles) and over 3 million come from a non-WMF referrer, i.e. presumably from usage all over the web. I don't know if such numbers are relevant for you but they certainly are significant for Wikimedia.
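
The figures above can be put in proportion with a little arithmetic. This sketch only restates the numbers quoted in the post, treating GB as 10^9 bytes (an assumption); because the post rounds the counts ("13 million", "over 3 million"), the results are approximate, and the external-referrer share is a lower bound.

```python
# Figures as quoted for February 2016 (DjVu files on Wikimedia wikis).
downloaded_bytes = 1625e9          # 1625 GB, assuming GB = 10**9 bytes
accesses = 13e6                    # total accesses
thumbnail_accesses = 4e6           # standard thumbnails, e.g. on Wikipedia articles
external_referrer_accesses = 3e6   # "over 3 million" from non-WMF referrers

avg_kb_per_access = downloaded_bytes / accesses / 1e3
thumbnail_share = thumbnail_accesses / accesses
non_thumbnail_share = 1 - thumbnail_share
external_share = external_referrer_accesses / accesses

print(f"average transfer per access: {avg_kb_per_access:.0f} kB")
print(f"thumbnail share of accesses: {thumbnail_share:.0%}")
print(f"non-thumbnail share of accesses: {non_thumbnail_share:.0%}")
print(f"non-WMF referrer share (at least): {external_share:.0%}")
```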

You can find the statistics at 2016-02-mediacounts-djvu-aggregated.ods; they come from mediacounts.

This post was modified by Nemo_bis on 2016-03-16 09:38:08

Poster: Jeff Kaplan Date: Mar 16, 2016 4:58pm
Forum: texts Subject: Re: Statistics on DjVu usage (and Wikimedia)

Low usage was part of the decision, but there is also a trend of browsers dropping support for the viewer. Java applets are no longer supported by most browsers; Chrome apparently disabled Java applets entirely last year. In both Firefox and Safari, Java is under user control and can be allowed, but in both of those browsers this particular applet immediately fails with a security warning. And while several browser plugins and extensions do exist for viewing DjVu files in-browser, it was decided to retire the format.

(corrected typo)

This post was modified by Jeff Kaplan on 2016-03-16 23:58:01

Poster: Nemo_bis Date: Mar 16, 2016 3:41pm
Forum: texts Subject: Re: Statistics on DjVu usage (and Wikimedia)

I see. On the other hand, even execution of native PDF plugins is being disabled in some browsers nowadays, and most users of modern operating systems (like most GNU/Linux distributions) have native viewers that open DjVu files automatically once they click a link (the file gets downloaded by the browser).